A Typology of Near-Identity Relations for Coreference (NIDENT)
Abstract
The task of coreference resolution requires people or systems to decide when two referring expressions refer to the ‘same’ entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of ‘near-identity’, a middle-ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) comprising fifteen types, grouped under four main families, that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K = 0.58, rising to K = 0.65 and K = 0.84 when one and two outliers, respectively, are left out). This work enables subsequent creation of the first internally consistent language resource of this type through larger annotation efforts.
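As an illustration of how a chance-corrected agreement figure such as the K values above can be computed, here is a minimal sketch. The abstract does not say which K statistic was used, so Fleiss' kappa (a common choice for more than two annotators) is assumed; the function name `fleiss_kappa`, the three category labels, and the toy counts are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of annotators.
    """
    if not counts:
        raise ValueError("need at least one annotated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of annotations")
    n_categories = len(counts[0])

    # Observed agreement: mean share of agreeing annotator pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement by chance, from the overall category proportions.
    total = n_items * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 mention pairs, 3 annotators, and the assumed
# categories (identity, near-identity, non-identity) as columns.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [1, 2, 0],
    [0, 0, 3],
]
print(round(fleiss_kappa(toy_counts), 3))
```

Dropping an outlier annotator, as the abstract describes, would amount to rebuilding the count table without that annotator's labels and recomputing the statistic.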
Similar resources
Experiments on bridging across languages and genres
In this paper, we introduce a typology of bridging relations applicable to multiple languages and genres. After discussing our annotation guidelines, we describe annotation experiments on the German part of our parallel coreference corpus and show that our interannotator agreement results are reliable, considering both antecedent selection and relation assignment. In order to validate our theor...
Annotating Near-Identity from Coreference Disagreements
We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introduci...
Annotating Extended Textual Coreference and Bridging Relations in the Prague Dependency Treebank
This technical report describes the project of manual annotation of extended textual coreference and bridging relations, which has been running at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, since 2009. It contains the typology of coreference and bridging relations, classification of elements that are annotated for coreference and the a...
An Evaluation of Inter-Annotator Agreement in the Observation of Anaphoric and Referential Relations
When proposing a description of observed data, a linguist must make sure that other people can reliably make the same observations. In this paper, we introduce a typology of anaphoric and referential relations and an experiment that aims to assess whether this typology is operational. Given three newspaper articles, five students were asked to identify anaphoric and/or referen...
Semantic Approach to Identity in Coreference Resolution Task
It has recently been discussed in linguistics that the notion of identity in the coreference resolution task is continuous in nature, ranging from “complete” identity to non-identity. The current paper confronts this idea with experimental data for Polish, resulting in a new approach to the notion of identity. It extends the definition of coreference with speaker/recipient relation, believed...